Improving gene expression cancer molecular pattern discovery using nonnegative principal component analysis.
Abstract
Robust identification of cancer molecular patterns from microarray data not only plays an essential role in modern clinical oncology but also presents a challenge for statistical learning. Although principal component analysis (PCA) is a widely used feature selection algorithm in microarray analysis, its holistic mechanism prevents it from capturing the latent local data structure needed in subsequent cancer molecular pattern identification. In this study, we investigate the benefit of enforcing non-negativity constraints on PCA and propose a nonnegative principal component analysis (NPCA) based classification algorithm for cancer molecular pattern analysis of gene expression data. This novel algorithm classifies meta-samples of the input cancer data using support vector machines (SVM) or other classic supervised learning algorithms. The meta-samples are low-dimensional projections of the original cancer samples in a purely additive meta-gene subspace generated from the NPCA-induced nonnegative matrix factorization (NMF). We report leading classification results from the NPCA-SVM algorithm in cancer molecular pattern identification for five benchmark gene expression datasets under 100 trials of 50% hold-out cross-validation and under leave-one-out cross-validation. We demonstrate the superiority of the NPCA-SVM algorithm by direct comparison with seven classification algorithms (SVM, PCA-SVM, KPCA-SVM, NMF-SVM, LLE-SVM, PCA-LDA and k-NN) on the five cancer datasets in terms of classification rates, sensitivities and specificities. The NPCA-SVM algorithm overcomes the over-fitting problem associated with SVM-based classification of gene expression data under a Gaussian kernel. As a more robust high-performance classifier, NPCA-SVM can be used to replace the general SVM and k-NN classifiers in cancer biomarker discovery to capture more meaningful oncogenes.
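The evaluation protocol named in the abstract (repeated 50% hold-out trials scored by classification rate, sensitivity and specificity) can be illustrated with a minimal sketch. This is not the authors' code: the function names `confusion_rates`, `one_nn_predict` and `holdout_trials` are hypothetical, the split is a plain unstratified shuffle (the abstract does not say how splits were drawn), and a 1-nearest-neighbour rule stands in for the classifier purely to keep the example dependency-free. NPCA itself is not implemented here.

```python
import random


def confusion_rates(y_true, y_pred, positive=1):
    """Return (classification rate, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    rate = (tp + tn) / len(pairs)
    # A rate is undefined (NaN) when the test half lacks that class.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return rate, sensitivity, specificity


def one_nn_predict(train_x, train_y, test_x):
    """Stand-in classifier: label each test sample by its nearest training sample."""
    preds = []
    for x in test_x:
        nearest = min(
            range(len(train_x)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
        )
        preds.append(train_y[nearest])
    return preds


def holdout_trials(samples, labels, predict, trials=100, seed=0):
    """Run repeated 50% hold-out trials; return one metrics tuple per trial."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n = len(samples)
    results = []
    for _ in range(trials):
        idx = list(range(n))
        rng.shuffle(idx)  # assumption: unstratified random split
        train, test = idx[: n // 2], idx[n // 2:]
        preds = predict(
            [samples[i] for i in train],
            [labels[i] for i in train],
            [samples[i] for i in test],
        )
        results.append(confusion_rates([labels[i] for i in test], preds))
    return results
```

In this sketch, any function with the same `(train_x, train_y, test_x)` signature could be passed as `predict`, so a projection step followed by a classifier could be swapped in for the 1-nearest-neighbour stand-in without changing the evaluation loop.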
Similar resources
Identification of Prognostic Genes in Her2-enriched Breast Cancer by Gene Co-Expression Network Analysis
Introduction: HER2-enriched subtype of breast cancer has a worse prognosis than luminal subtypes. Recently, the discovery of targeted therapies in other groups of breast cancer has increased patient survival. The aim of this study was to identify genes that affect the overall survival of this group of patients based on a systems biology approach. Methods: Gene expression data and clinical infor...
Nonnegative Principal Component Analysis for Proteomic Tumor Profiles
Identifying cancer molecular patterns with high accuracy from high-dimensional proteomic profiles presents a challenge for statistical learning and oncology research. In this study, we develop a nonnegative principal component analysis and propose a nonnegative principal component analysis based support vector machine with a sparse coding to conduct effective feature selection and high-performanc...
Ethanol and Cancer Induce Similar Changes on Protein Expression Pattern of Human Fibroblast Cell
Ethanol is widely consumed around the world. Many studies have confirmed adverse effects of this compound on human health. In addition, recent studies showed significant alterations in both the cellular population and the protein profile of the human foreskin fibroblast cell line (HFFF2) at a specific dosage of ethanol. Here, the role and interaction of some proteins (characterized by sig...
Machine Learning Techniques for Thyroid Cancer Diagnosis
Drawing inspiration from Alexander’s paper on classification of thyroid cancer, we are interested in replicating and possibly improving the predictive results of a learning model for detecting thyroid cancer from gene expression data from thyroid nodules. This data set is the same data used in the paper by Alexander. We will develop our own gene expression classifier by applying different featu...
Journal:
- Genome informatics. International Conference on Genome Informatics
Volume 21
Publication date: 2008